ABSTRACT



We have constructed a set of supercluster catalogues for the galaxies from the SDSS survey main and luminous red galaxy (LRG) flux-limited samples. To delineate superclusters, we calculated luminosity density fields using the B3-spline kernel with a radius of 8 h⁻¹ Mpc for the main sample and 16 h⁻¹ Mpc for the LRG sample. We then defined regions with densities above a selected threshold as superclusters, while utilising almost the whole volume of both samples. We created two types of catalogues: one with an adaptive local threshold and a set of catalogues with different global thresholds. We describe the supercluster catalogues and their general properties. Using a smoothed bootstrap, we find uncertainty estimates for the density field and use these to attribute confidence levels to the catalogue objects. We have also created a test catalogue for the galaxies from the Millennium simulation to compare the simulated and observed superclusters and to clarify the methods we use. We find that the superclusters are well-defined systems and that the properties of the superclusters of the main and LRG samples are similar. We also show that adaptive local thresholds yield a sample of superclusters whose properties do not depend on their distance from the observer. The Millennium galaxy catalogue superclusters are similar to those observed.

Key words. cosmology: large-scale structure of the Universe - galaxies: clusters: general



1. Introduction 

The large-scale structure of the galaxy distribution is characterised by large voids and by a complex web of galaxy filaments and clusters. Superclusters are the largest components of the cosmic web. They are collections of galaxies and galaxy clusters, with typical sizes of 20-100 h⁻¹ Mpc. They can contain up to hundreds of galaxy groups and several rich clusters. The first supercluster to be described was the Local Supercluster (de Vaucouleurs 1953), and many other superclusters have since been found and studied in our neighbourhood.

Astronomers have a long tradition of selecting galaxy clusters and groups from this web, but quantifying the overall web is a much more difficult task. This can be done in several ways, all of them computer-intensive and based on the properties of a smoothed galaxy density field. Good recent examples are the application of the multiscale morphology filter by Aragón-Calvo et al. (2010) and the Bayesian inference of the density field with the subsequent classification of the web elements by Jasche et al. (2010). These articles contain an exhaustive set of references. In this approach, the different sets of web components differ mainly in their dimensionality (clusters, filaments, sheets, and voids). Another approach is to divide the observed weblike galaxy distribution into its main building blocks, the superclusters. Superclusters are frequently treated in a similar way to groups and clusters of galaxies: as density enhancements in the overall galaxy distribution.

This approach leads to the construction of supercluster catalogues from smoothed density fields, based either on Abell clusters (Einasto et al. 1997, 2001) or on galaxy groups (Einasto et al. 2007). A similar method has recently been used by Costa-Duarte et al. (2011) and Luparello et al. (2011). Basilakos (2003) used the friends-of-friends method to compile superclusters from the SDSS sample.

Like other astronomical catalogues, supercluster catalogues serve as a basis for describing and studying classes of objects, and they are also essential for further work. This includes planning observational projects, comparing different classes of astronomical objects, and comparing theory (simulations) with observations. We present here supercluster catalogues based on the richest existing redshift survey, the SDSS DR7. These catalogues have already been used in several studies. These include a study of the locations of quasars within the large-scale structure delineated by galaxies (Lietzen et al. 2009), a couple of observing proposals to search for the warm-hot intergalactic medium, and a morphological study of the rich superclusters forming the Sloan Great Wall (Einasto et al. 2010). The catalogue has also been used for a preliminary identification of a Sunyaev-Zel'dovich (SZ) source seen in the early Planck mission data (Planck Collaboration et al. 2011).

The paper is organised as follows. In Sect. 2 we describe our method, beginning with the calculation of the density field. We also outline the principles of supercluster delineation and explain how some of the more important properties of the superclusters are calculated. In this section we also address the errors of the density field estimates. In Sect. 3 we describe the datasets used. Supercluster properties are described in Sect. 4, where we also compare different samples. The resulting catalogue can be downloaded from http://atmos.physic.ut.ee/~juhan/super/, with a complete description in the readme files. We will also upload selected parts of the catalogues (listed in Appendix C) to the Strasbourg Astronomical Data Center (CDS)¹.

2. Delineating superclusters by the luminosity density field

We define superclusters on the basis of the total density, which is dominated by dark matter. Suppose that the bias (the ratio of the dark matter density to the stellar density) is approximately constant on supercluster scales. Then the observational counterpart of the total density is the luminosity density. Earlier supercluster catalogues by Einasto et al. (2003, 2007) used clusters or groups to create the density field; we use the full galaxy distribution instead. Before calculating the density field, we processed the galaxy data to reduce several observational selection effects. The galaxy and group samples we used are described in Sect. 3.

2.1. Distance and luminosity corrections for the SDSS main sample

Spectroscopic galaxy samples (such as the SDSS) are affected by cluster-finger redshift distortions (the fingers-of-god). To suppress these distortions, we use the rms sizes of galaxy groups and their radial velocity dispersions from the Tago et al. (2010) galaxy group catalogue. In this catalogue, comoving distances (see e.g. Martínez & Saar 2002) are used for galaxies and groups, in units of h⁻¹ Mpc. For groups with three or more members, we take the radial distances between the group galaxies and the group centre (at distance $d_{\rm group}$). We rescale these by the ratio $\sigma_r/(\sigma_v/H_0)$ of the group extent in the sky plane to its extent along the line of sight. This removes the smearing of the density field by the cluster fingers. The corrected galaxy distance $d_{\rm gal}$ is found as



$$d_{\rm gal} = d_{\rm group} + \left(d^{0}_{\rm gal} - d_{\rm group}\right)\frac{\sigma_r}{\sigma_v/H_0}, \qquad (1)$$



where $d^{0}_{\rm gal}$ is the initial distance of the galaxy and $\sigma_r$ the standard deviation of the projected distance in the sky from the group centre. $\sigma_v$ is the standard deviation of the radial velocity; both are in physical coordinates at the group location. $H_0 = 100\,h$ km s⁻¹ Mpc⁻¹ is the Hubble constant.
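To make the finger-of-god compression of Eq. (1) concrete, here is a minimal sketch. The function name `compress_finger` and its argument order are hypothetical. It assumes $\sigma_v$ is given in km s⁻¹, and distances and $\sigma_r$ in h⁻¹ Mpc, so that $\sigma_v/H_0$ with $H_0 = 100$ is the finger length expressed as a distance.

```python
def compress_finger(d_gal0, d_group, sigma_r, sigma_v, n_members, h0=100.0):
    """Eq. (1): corrected distance of a group galaxy (hypothetical helper).

    d_gal0, d_group, sigma_r are in h^-1 Mpc; sigma_v is in km/s.
    h0 = 100 (units of h km/s/Mpc), so sigma_v / h0 is the radial
    spread of the cluster finger expressed as a distance.
    """
    if n_members < 3:
        return d_gal0                      # only groups with >= 3 members are corrected
    finger_spread = sigma_v / h0           # line-of-sight extent of the finger
    # shrink the radial offset from the group centre by sigma_r / finger_spread
    return d_group + (d_gal0 - d_group) * sigma_r / finger_spread
```

For example, a group at 200 h⁻¹ Mpc with $\sigma_r = 0.5$ h⁻¹ Mpc and $\sigma_v = 400$ km s⁻¹ has a 4 h⁻¹ Mpc finger. A member observed at 204 h⁻¹ Mpc is then moved to 200.5 h⁻¹ Mpc.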

We use a Cartesian grid based on the SDSS angular survey coordinates $\eta$ and $\lambda$, because it allows the most efficient placement of the galaxy sample cone inside a box. The galaxy coordinates are calculated as follows:

$$x = -d_{\rm gal}\sin\lambda, \qquad y = d_{\rm gal}\cos\lambda\cos\eta, \qquad z = d_{\rm gal}\cos\lambda\sin\eta. \qquad (2)$$
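Eq. (2) translates directly into code. The sketch below assumes the survey coordinates are given in degrees. The name `sdss_cartesian` is hypothetical.

```python
import math

def sdss_cartesian(d_gal, eta_deg, lambda_deg):
    """Eq. (2): Cartesian coordinates from distance and SDSS survey angles (degrees)."""
    eta = math.radians(eta_deg)
    lam = math.radians(lambda_deg)
    x = -d_gal * math.sin(lam)
    y = d_gal * math.cos(lam) * math.cos(eta)
    z = d_gal * math.cos(lam) * math.sin(eta)
    return x, y, z
```

The transformation is a rotation, so $\sqrt{x^2 + y^2 + z^2} = d_{\rm gal}$. The direction $\eta = \lambda = 0$ lies along the $y$ axis.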

To compensate for selection effects and to ensure that the reconstructed density field does not depend on distance, we have to account for the luminosities of the galaxies that fall outside the survey magnitude window. We follow the procedure of Tempel et al. (2011) and consider every galaxy as a visible member of a density enhancement (a group or cluster) within the visibility range at the distance of the galaxy. We estimate the amount of unobserved luminosity and weight each galaxy as

$$L_{\rm gal,w} = W_L(d)\,L_{\rm gal}, \qquad (3)$$

¹ Supercluster tables will be available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/



where $L_{\rm gal} = L_\odot 10^{0.4(M_\odot - M)}$ is the observed luminosity of a galaxy with the absolute magnitude $M$, and $M_\odot$ is the absolute magnitude of the Sun. The quantity $W_L(d)$ is the distance-dependent weight factor. It is the ratio of the expected total luminosity to the luminosity within the visibility window:

$$W_L(d) = \frac{\int_0^\infty L\,n(L)\,{\rm d}L}{\int_{L_1(d)}^{L_2(d)} L\,n(L)\,{\rm d}L}, \qquad (4)$$

where $L_{1,2}(d)$ are the luminosity limits corresponding to the survey apparent magnitude limits $m_{1,2}$ at the distance $d$.
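The weight of Eqs. (3)-(4) can be evaluated numerically. This sketch uses the double power-law luminosity function of Eq. (5) and the main-sample parameter values quoted in Sect. 3.1 ($\alpha = -1.42$, $\delta = -8.27$, $\gamma = 1.92$, $M^* = -21.97$, $m_1 = 12.5$, $m_2 = 17.77$). It makes two simplifying assumptions: it ignores the k+e correction, and it takes the luminosity distance $d_L$ directly as input. The trapezoid rule in $\ln L$ is an arbitrary choice of quadrature. The helper names (`lf`, `lum_integral`, `weight_main`) are hypothetical.

```python
import math

ALPHA, DELTA, GAMMA = -1.42, -8.27, 1.92   # Eq. (5) parameters from Sect. 3.1
M_STAR = -21.97                            # absolute magnitude corresponding to L*

def lf(x):
    """Double power-law luminosity function, Eq. (5), with x = L / L*."""
    return x ** ALPHA * (1.0 + x ** GAMMA) ** ((DELTA - ALPHA) / GAMMA)

def lum_integral(x_lo, x_hi, n=20000):
    """Integral of x * lf(x) dx over [x_lo, x_hi], trapezoid rule in u = ln x."""
    u_lo, u_hi = math.log(x_lo), math.log(x_hi)
    h = (u_hi - u_lo) / n
    total = 0.0
    for k in range(n + 1):
        x = math.exp(u_lo + k * h)
        f = x * x * lf(x)                  # x * lf(x) * dx, with dx = x du
        total += f if 0 < k < n else 0.5 * f
    return total * h

def weight_main(d_lum, m1=12.5, m2=17.77):
    """Eq. (4) for luminosity distance d_lum in h^-1 Mpc (no k+e correction)."""
    # apparent limits -> absolute limits -> L / L* = 10**(0.4 (M* - M))
    x1 = 10 ** (0.4 * (M_STAR - (m1 - 25.0 - 5.0 * math.log10(d_lum))))
    x2 = 10 ** (0.4 * (M_STAR - (m2 - 25.0 - 5.0 * math.log10(d_lum))))
    lo, hi = min(x1, x2), max(x1, x2)
    # [e^-40, e^10] covers all but a negligible part of the full luminosity integral
    return lum_integral(math.exp(-40.0), math.exp(10.0)) / lum_integral(lo, hi)
```

Because the integrand is positive, the window integral never exceeds the full one, so $W_L \ge 1$. At large distances the faint limit rises through the luminosity function, so the weight grows steeply with distance.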

We approximate the luminosity function by a double power law:

$$n(L)\,{\rm d}L \propto (L/L^*)^{\alpha}\left(1 + (L/L^*)^{\gamma}\right)^{(\delta-\alpha)/\gamma}\,{\rm d}(L/L^*), \qquad (5)$$

where $\alpha$ is the exponent at low luminosities, $(L/L^*) \ll 1$, and $\delta$ the exponent at high luminosities, $(L/L^*) \gg 1$. $\gamma$ is a parameter that determines the speed of the transition between the two power laws, and $L^*$ is the characteristic luminosity of the transition. This form represents the bright end of the luminosity function better than the usual Schechter function (Tempel et al. 2009).

2.2. Luminosity corrections for the LRG sample 

The luminosity function of the SDSS LRGs has already been determined (Wake et al. 2006). Even so, it is difficult to calculate the luminosity weights for LRGs as we did above for the main sample, for a simple reason: the LRG sample does not have two magnitude limits. We therefore find the observed comoving luminosity density $\ell(d)$ and define the luminosity weight through its inverse:

$$W_L(d) = \ell(d_0)/\ell(d), \qquad (6)$$



where $d_0$ is the fiducial comoving distance (taken as 435.6 h⁻¹ Mpc; see Sect. 3.2).
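One way to implement Eq. (6) is to estimate $\ell(d)$ in concentric distance shells. The shell binning is an assumption of this sketch; the text does not specify how $\ell(d)$ is estimated. The survey solid angle is left out because it is a common factor that cancels in the ratio. All galaxy distances and $d_0$ are assumed to lie inside the outermost shell edges. The function names are hypothetical.

```python
import bisect

def shell_luminosity_density(distances, luminosities, edges):
    """Observed luminosity per unit volume in shells [edges[k], edges[k+1]).

    Shell volumes are taken as r2**3 - r1**3; the factor Omega/3 is dropped
    because it cancels in the ratio of Eq. (6).
    """
    sums = [0.0] * (len(edges) - 1)
    for d, lum in zip(distances, luminosities):
        k = bisect.bisect_right(edges, d) - 1
        if 0 <= k < len(sums):
            sums[k] += lum
    return [s / (edges[k + 1] ** 3 - edges[k] ** 3) for k, s in enumerate(sums)]

def lrg_weights(distances, luminosities, edges, d0):
    """Eq. (6): W_L(d) = ell(d0) / ell(d), evaluated in the shell of each galaxy."""
    ell = shell_luminosity_density(distances, luminosities, edges)
    shell = lambda d: bisect.bisect_right(edges, d) - 1
    ell0 = ell[shell(d0)]
    return [ell0 / ell[shell(d)] for d in distances]
```

In practice the shells must be wide enough to contain many LRGs. Otherwise the weights inherit the shot noise of the shell estimates.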

Both these luminosity correction schemes (for the main and LRG samples) add luminosity at the observed galaxy locations and cannot restore the real, unobserved galaxies. This evidently increases the shot noise at large distances, but that is unavoidable.



2.3. Calculation of the luminosity density field 

We describe the mathematical details of the luminosity density calculation in Appendix A; here we give a brief summary of the procedure. We denote the luminosity density field on a grid by $\ell_{\bf i}$, where ${\bf i} = (i, j, k)$ are the indices of the vertices.

The luminosity densities are calculated by a kernel sum:

$$\ell_{\bf i} = \sum_{\rm gal} K^{(3)}_B\!\left(\frac{{\bf r}_{\rm gal} - {\bf r}_{\bf i}}{a}\right) L_{\rm gal,w}, \qquad (7)$$

where $L_{\rm gal,w}$ is the weighted galaxy luminosity, ${\bf r}_{\rm gal}$ and ${\bf r}_{\bf i}$ are the positions of the galaxy and of the grid vertex, and $a$ is the kernel scale.

We use the $B_3$ spline kernel $B_3(x/a)$ (see Appendix A) to construct the one-dimensional kernel $K^{(1)}_B(x/a)$. The three-dimensional kernel is the direct product of three one-dimensional kernels. The scale $a$ can be regarded as the effective radius of the kernel, and its choice is determined by the application.
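The kernel sum of Eq. (7) can be sketched with the standard $B_3$ spline, $B_3(x) = \left(|x-2|^3 - 4|x-1|^3 + 6|x|^3 - 4|x+1|^3 + |x+2|^3\right)/12$. The sketch also assumes the normalisation $K^{(1)}_B(x) = B_3(x/a)/a$, so that the 3-D kernel integrates to unity. Any constant normalisation factor cancels anyway when the field is divided by its mean. Galaxies are passed as hypothetical `(x, y, z, L_w)` tuples.

```python
def b3(x):
    """B3 spline; compact support |x| < 2, B3(0) = 2/3, integral = 1."""
    if abs(x) >= 2.0:
        return 0.0
    return (abs(x - 2) ** 3 - 4 * abs(x - 1) ** 3 + 6 * abs(x) ** 3
            - 4 * abs(x + 1) ** 3 + abs(x + 2) ** 3) / 12.0

def kernel3(dx, dy, dz, a):
    """3-D kernel as a direct product of 1-D kernels K(x) = B3(x / a) / a."""
    return b3(dx / a) * b3(dy / a) * b3(dz / a) / a ** 3

def density_at(vertex, galaxies, a):
    """Eq. (7): kernel sum of weighted luminosities at one grid vertex."""
    vx, vy, vz = vertex
    return sum(lw * kernel3(gx - vx, gy - vy, gz - vz, a)
               for gx, gy, gz, lw in galaxies)
```

A useful check is that the $B_3$ spline is a partition of unity on an integer grid. With unit grid spacing and $a = 1$, summing one galaxy's contributions over all vertices therefore returns its full luminosity, wherever the galaxy sits.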

As the last step before extracting superclusters, we convert densities into units of the mean density. The main purpose of this is to facilitate comparison between different density fields. For that, we construct a pixel mask that follows the sample edges. We determine the mean density as an average over all vertices inside the mask,

$$\overline{\ell} = \frac{1}{N_{\rm mask}}\sum_{{\bf i}\in{\rm mask}} \ell_{\bf i}, \qquad (8)$$

where $N_{\rm mask}$ is the number of grid vertices inside the mask. We finally normalise the density field as

$$D_{\bf i} = \ell_{\bf i}/\overline{\ell} \qquad (9)$$

for all vertices with coordinates ${\bf i}$ inside the mask. The vertices outside the mask are not used again.
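Eqs. (8)-(9) amount to a masked mean and a division. In this minimal sketch, the field is a dictionary from vertex indices to $\ell_{\bf i}$ and the mask is a set of vertices. Both data layouts are assumptions made for illustration.

```python
def normalise(density, mask):
    """Eqs. (8)-(9): express densities in units of the mean density inside the mask.

    density: dict vertex -> ell_i; mask: set of vertices inside the sample.
    Vertices outside the mask are dropped from the result.
    """
    mean = sum(density[v] for v in mask) / len(mask)   # Eq. (8)
    return {v: density[v] / mean for v in mask}        # Eq. (9)
```

By construction, the normalised field has a mean of exactly 1 over the masked vertices.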

We find the variances $\sigma_{\bf i}^2$ of the density field estimates for all vertices by a smoothed bootstrap, as described in Appendix B. Using these, we calculate for every grid vertex the signal-to-noise ratio

$$G_{\bf i} = D_{\bf i}/\sigma_{\bf i}, \qquad (10)$$

which is used later to estimate confidence levels of superclusters. The parameters and properties of the luminosity density fields are described in Sect. 4.1.
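The sketch below shows a generic smoothed bootstrap estimate of $\sigma_{\bf i}$ and $G_{\bf i}$ at a single vertex. Each realisation resamples galaxies with replacement and then shifts them by random $B_3$-distributed offsets of a given scale. The resampling step is an assumption; the authors' exact scheme is given in Appendix B. The offsets rely on a standard fact: $B_3$ is the density of a sum of four independent uniform variables on $(-1/2, 1/2)$. The ratio is computed on unnormalised densities. It equals $D_{\bf i}/\sigma_{\bf i}$ if every realisation is divided by the same mean. All function names are hypothetical.

```python
import math
import random

def b3(x):
    """B3 spline kernel, zero for |x| >= 2."""
    if abs(x) >= 2.0:
        return 0.0
    return (abs(x - 2) ** 3 - 4 * abs(x - 1) ** 3 + 6 * abs(x) ** 3
            - 4 * abs(x + 1) ** 3 + abs(x + 2) ** 3) / 12.0

def density_at(vertex, galaxies, a):
    """Kernel sum of weighted luminosities (x, y, z, L_w) at one vertex."""
    vx, vy, vz = vertex
    return sum(lw * b3((x - vx) / a) * b3((y - vy) / a) * b3((z - vz) / a) / a ** 3
               for x, y, z, lw in galaxies)

def b3_offset(rng, scale):
    """Random offset with a B3-shaped distribution (sum of four uniforms)."""
    return scale * sum(rng.random() - 0.5 for _ in range(4))

def bootstrap_signal_to_noise(vertex, galaxies, a, shift_scale, n_real=100, seed=0):
    """Return (G, sigma) at one vertex from n_real smoothed-bootstrap realisations."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_real):
        resampled = [rng.choice(galaxies) for _ in galaxies]   # draw with replacement
        shifted = [(x + b3_offset(rng, shift_scale), y + b3_offset(rng, shift_scale),
                    z + b3_offset(rng, shift_scale), lw) for x, y, z, lw in resampled]
        estimates.append(density_at(vertex, shifted, a))
    mean = sum(estimates) / n_real
    sigma = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_real - 1))
    return density_at(vertex, galaxies, a) / sigma, sigma
```

In a full implementation, each realisation would be evaluated on the whole grid at once rather than one vertex at a time. The per-vertex standard deviation is then taken across realisations.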



2.4. Assembly of superclusters 

We define our superclusters using the luminosity density field. A conventional way is to choose a density level and to define superclusters as connected regions with densities above that level (see, e.g., Einasto et al. 2007; Luparello et al. 2011). For different tasks, these levels are chosen differently. We therefore create sets of contour surfaces for different density thresholds $D_n$, sampling the density range from $D_{\min}$ to $D_{\max}$ with a constant increment $\delta D$.

We use density peaks to identify superclusters (density field objects). Contiguous supercluster regions are grown pixel by pixel around the peaks in the density field, resulting in a marker field

$$M_{n,{\bf i}} = {\rm ID}_{\rm peak}, \qquad {\bf i} \in \{{\bf i} \mid D_{\bf i} > D_n\}, \qquad (11)$$

where ${\rm ID}_{\rm peak}$ is the density peak number. All the vertices belonging to an object are assigned the same marker value.
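A minimal sketch of the marker field of Eq. (11) at a single threshold $D_n$ follows. The field is a dictionary from vertex indices to $D_{\bf i}$. Regions are grown breadth-first from the highest unlabelled vertex above the threshold. Two details are assumptions of the sketch: face-adjacent (6-connected) vertices are treated as contiguous, and peaks are numbered in order of decreasing peak density.

```python
from collections import deque

OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def neighbours(v):
    """Face-adjacent (6-connected) grid vertices; an assumed connectivity rule."""
    i, j, k = v
    for di, dj, dk in OFFSETS:
        yield (i + di, j + dj, k + dk)

def marker_field(density, threshold):
    """Eq. (11): map each vertex with D > threshold to the number of its peak."""
    above = {v for v, d in density.items() if d > threshold}
    marker = {}
    next_id = 1
    for start in sorted(above, key=density.get, reverse=True):
        if start in marker:
            continue                        # already reached from a higher peak
        marker[start] = next_id             # highest unlabelled vertex is a new peak
        queue = deque([start])
        while queue:                        # breadth-first growth of the region
            u = queue.popleft()
            for w in neighbours(u):
                if w in above and w not in marker:
                    marker[w] = next_id
                    queue.append(w)
        next_id += 1
    return marker
```

On a one-dimensional profile with peaks 7 and 5 separated by a bridge of density 2, one object appears below the bridge level. Above it, two separate objects appear.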

We start scanning the field at high densities and move on to lower density levels. Each time an object first appears, it is assigned a unique identification number that will be used for this supercluster throughout the catalogue. We record when an object emerges from the field, and whether and how it is eventually swallowed up by another density field object. If such a merger occurs, the identifier of the object with the higher peak value is used to designate the merged object from then on. To record the merging history of the density field objects, we arrange them into a tree structure encompassing all the density thresholds.
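The downward scan with merger bookkeeping can be sketched with a union-find structure. Vertices are visited in order of decreasing density. A vertex with no labelled neighbours starts a new object. A vertex touching several objects merges them, and the object with the higher peak keeps its identifier. This continuous sweep records each merger at the exact density of the connecting vertex; the stepped thresholds with increment $\delta D$ would quantise that value. The function names and the 6-connectivity are assumptions.

```python
OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def neighbours(v):
    """Face-adjacent (6-connected) grid vertices; an assumed connectivity rule."""
    i, j, k = v
    for di, dj, dk in OFFSETS:
        yield (i + di, j + dj, k + dk)

def merger_history(density, neighbours=neighbours):
    """Sweep vertices from high to low density, growing objects around peaks.

    Returns (labels, merges): labels maps vertex -> final object id, and merges
    lists (D_merge, absorbed_id, surviving_id) in the order the mergers occur.
    """
    parent, peak, owner, merges = {}, {}, {}, []

    def find(obj):
        # union-find root lookup with path halving
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]
            obj = parent[obj]
        return obj

    next_id = 1
    for v in sorted(density, key=density.get, reverse=True):
        roots = {find(owner[w]) for w in neighbours(v) if w in owner}
        if not roots:                        # a new density peak appears
            parent[next_id] = next_id
            peak[next_id] = density[v]
            owner[v] = next_id
            next_id += 1
            continue
        survivor = max(roots, key=peak.get)  # the higher peak keeps its identifier
        for r in roots - {survivor}:
            parent[r] = survivor
            merges.append((density[v], r, survivor))
        owner[v] = survivor
    return {v: find(obj) for v, obj in owner.items()}, merges
```

The `merges` list is the raw material for the merger tree. Each entry says at which density an object stopped being separate and which object absorbed it.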

We finally assemble superclusters by distributing galaxies among the density field objects. We do this for each density threshold by correlating galaxy positions with the corresponding marker field. For the SDSS main sample we also assign galaxy groups to superclusters. A group or cluster is in a supercluster if its centre is located inside the supercluster contour. In that case, all its member galaxies automatically also belong to the same supercluster. We also impose a lower limit of $(a/2)^3$ on the volume of a supercluster, where $a$ is the smoothing scale. This removes small spurious density field objects that include no galaxies.



2.5. Selection of density thresholds 

With the multitude of available thresholds comes the question: which one is "correct"? Just as there is no clear-cut definition of superclusters, there is also no single answer to this. We offer two possibilities for tackling this problem. The first is the conventional way of choosing a fixed density level, as described above. This gives a set of objects that are comparable within the whole sample volume. The density level $D_n$ can be selected according to the properties of the superclusters one wishes to study. For example, low density levels are better for identifying structures, while higher levels are useful for studying the details of the structure; sometimes a set of density levels is needed. Examples include the density level 5.0 used by Einasto et al. (2011), level 4.6 used by Einasto et al. (2007), 5.5 in Luparello et al. (2011), and the set of levels in Lietzen et al. (2009). However, this approach is susceptible to Poisson noise, especially in sparser environments. It also does not take the richness differences of superclusters into account. We demonstrate both effects in Sect. 4.3.



Because of that, we offer an alternative procedure that assigns an individual threshold to each supercluster, adapting to the local density level. The idea is to follow the growth of each supercluster from a compact volume around its centre, lowering the density level and observing the supercluster mergers. We define a supercluster as the volume within the density contour up to its first merger. This breaks the large-scale structure into a collection of compact components. Every component (supercluster) then has its own limiting density level, as is usual for other astronomical objects; galaxies, for example, are not defined by a common limiting stellar density level. As a result, we get a set of superclusters that together form the connected large-scale cosmic web.

To identify such superclusters in practice, it is easier to begin from lower densities and to proceed upwards. The mergers can then be seen as breakups of structures. We trace the splitting events in the tree of density field objects. At a split, a lower density filament ceases to be a "bridge" between two higher density regions. We take the density value just above that of the bridge, after the split, as the defining density level for these two objects. If one of these objects is broken up again at some higher threshold, this does not affect the other one.

As a downside, this technique still requires manually setting several limits. First, the minimal size of a supercluster must be selected, since some of the breaks obviously involve objects that are too small to be of interest. In previous studies, Einasto et al. (2007) used a lower volume limit of 10 (h⁻¹ Mpc)³. Costa-Duarte et al. (2011) use ten galaxies as a minimum for their superclusters, in combination with a volume limit of 64 (h⁻¹ Mpc)³. Luparello et al. (2011) use an object luminosity of 10¹² L_⊙ as the lower limit. In this study we use the diameter of the supercluster.

We must also choose the maximum threshold $D_{\rm lim}$. Most of the superclusters are defined at similar density levels. However, some very rich clusters with their surroundings can satisfy the minimum size condition at a much higher level, and the algorithm may break up well-established structures (we discuss these differences in Sect. 4.3). Because of that, we proceed in two steps. First, we find the thresholds for all objects. We then take as the maximum threshold $D_{\rm lim}$ the density level below which 95% of the objects have their thresholds. Second, we recalculate the thresholds but prohibit splitting of structures above that level.
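The first step of this two-step procedure reduces to a quantile of the individual thresholds. The sketch below reads "95% of objects have a lower threshold" as the smallest level at or below which at least 95% of the adaptive thresholds lie. That reading is an assumption. The second step, re-running the splitting with $D_{\rm lim}$ as a ceiling, is not shown. The function name is hypothetical.

```python
import math

def max_threshold(adaptive_thresholds, fraction=0.95):
    """Smallest threshold such that at least `fraction` of objects lie at or below it."""
    levels = sorted(adaptive_thresholds)
    k = max(math.ceil(fraction * len(levels)) - 1, 0)
    return levels[k]
```

With 20 objects at thresholds 1, 2, ..., 20, this gives $D_{\rm lim} = 19$. Only the object at 20 is then prevented from being split further.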
The natural lower density limit is the percolation density level. Percolation happens when the largest structure starts to fill the sample volume. In practice, we define the percolation level $D_{\rm perc}$ as the density at which the richness of the second-richest structure starts to decrease as the density level is lowered (Martínez & Saar 2002).
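Given the richness of the second-richest structure at each threshold, the percolation criterion is a simple scan from high to low density. This sketch returns the last level before that richness first drops. It does no smoothing, so in noisy data an early spurious dip would be picked up. The names and the input layout are hypothetical.

```python
def percolation_level(thresholds, second_richness):
    """Return D_perc from levels in decreasing order and the matching richness
    of the second-richest structure; None if the richness never decreases."""
    for k in range(1, len(thresholds)):
        if second_richness[k] < second_richness[k - 1]:
            return thresholds[k - 1]        # last level before the first decrease
    return None
```

For thresholds 8, 7, ..., 3 with second-richest richness 10, 40, 90, 120, 60, 30, the decrease starts below $D = 5$, so $D_{\rm perc} = 5$.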

Shifting the maximum density threshold upwards will frag- 
ment structures further. Reducing the minimum size of a super- 
cluster will have the same effect and will also increase the num- 
ber of small objects. 

We present the SDSS main and LRG supercluster catalogues in two versions: one with a set of fixed levels and the other with adaptive density thresholds. We describe the differences between these catalogues in more detail in Sect. 4.3.

2.6. Supercluster properties 

After delineating superclusters as described in previous sections, 
we calculate a number of supercluster properties for all density 
levels using both the density field and the galaxy data. In the 
following we describe the calculation of the most important at- 
tributes of superclusters that will be included in the catalogues. 
The initial density peak, from which the supercluster grew and 
which usually indicates the presence of a large galaxy cluster, 
marks the supercluster position. 

The supercluster volume is found from the density field as the number of connected grid cells multiplied by the cell volume:

$$V_{\rm scl} = N_{{\rm cell}\in{\rm scl}}\,\Delta^3, \qquad (12)$$

where $\Delta$ is the grid cell length. As an estimate of the total luminosity of the supercluster, we also find the sum of normalised densities at the grid vertices within the supercluster:

$$\sum_{{\bf i}\in{\rm scl}} D_{\bf i}. \qquad (13)$$

Using galaxy luminosities, we obtain two more estimates of the total luminosity of the supercluster. These are the sum of the observed galaxy luminosities and the sum of the weighted galaxy luminosities:

$$L_{\rm scl,gal} = \sum_{{\rm gal}\in{\rm scl}} L_{\rm gal}, \qquad (14)$$

$$L_{\rm scl,wgal} = \sum_{{\rm gal}\in{\rm scl}} L_{\rm gal,w}. \qquad (15)$$

We find the number of galaxies, $N_{\rm gal}$, in a supercluster and, if available, the number of galaxy groups and clusters, $N_{\rm gr}$. We define the supercluster diameter as the maximum distance between its galaxies. Using the galaxy locations and their weighted luminosities, we find the supercluster centre of mass (luminosity):

$${\bf r}_{\rm scl} = \frac{1}{L_{\rm scl,wgal}}\sum_{{\rm gal}\in{\rm scl}} {\bf r}_{\rm gal}\,W_L(d_{\rm gal})\,L_{\rm gal}. \qquad (16)$$
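The luminosity-weighted centre of Eq. (16) and the diameter (the largest galaxy-galaxy separation) can be computed together. This sketch takes hypothetical `(x, y, z, L_w)` tuples, where `L_w` $= W_L(d_{\rm gal})L_{\rm gal}$ has already been applied. It uses a brute-force pairwise search, which is quadratic in the number of galaxies. For very rich superclusters, restricting the search to convex-hull vertices would be a faster alternative.

```python
import math
from itertools import combinations

def centre_and_diameter(galaxies):
    """Return (Eq. 16 luminosity-weighted centre, maximum pairwise galaxy distance)."""
    total = sum(g[3] for g in galaxies)                        # L_scl,wgal, Eq. (15)
    centre = tuple(sum(g[c] * g[3] for g in galaxies) / total for c in range(3))
    diameter = max((math.dist(p[:3], q[:3]) for p, q in combinations(galaxies, 2)),
                   default=0.0)
    return centre, diameter
```

For three galaxies at (0, 0, 0), (4, 0, 0) and (0, 3, 0) with weighted luminosities 1, 1 and 2, the centre is (1, 1.5, 0) and the diameter is 5.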

Among these quantities, the most important are the supercluster diameter and the weighted total luminosity, because they are least affected by the distance to the supercluster. We assume that weighting the galaxies restores the real total luminosities. We may lose dim galaxies, but the brighter ones still mark the supercluster region sufficiently well (Tempel et al. 2011).
Fig. 1. The sky projection of the DR7 galaxies and the mask in the SDSS η and λ survey coordinates.

Also, as we show later, neither of these properties is very sen- 
sitive to the choice of the density threshold. Because of the dif- 
ferent weighting in the LRG sample, the weighted luminosity 
there cannot be used as an approximation to the total luminosity, 
as for the main sample. 

We identify a supercluster by a "marker galaxy", which we arbitrarily choose to be a bright galaxy near the highest density peak in the supercluster volume. The aim is to tie a supercluster to an observational object and to construct an identifier that is not specific to the current catalogue. The long identification number has the format AAA+BBB+CCCC. Here AAA and BBB are the integer parts of the equatorial coordinates $\alpha$ and $\delta$ of the marker galaxy, and CCCC is its redshift multiplied by 1000.

We check whether a supercluster is in contact with the mask edge. A location near the sample boundary implies that the supercluster is incomplete, and its parameters may not be reliable. Using the signal-to-noise field $G$ (Eq. 10; see also Appendix B), we calculate for each supercluster a confidence estimate

$$C_{\rm scl} = \frac{1}{N_{\rm gal}}\sum_{{\rm gal}\in{\rm scl}} G({\bf r}_{\rm gal}). \qquad (17)$$

We interpolate the signal-to-noise ratio values of the density estimate to the galaxy locations and find the average over all galaxies in the supercluster. An extended description of the supercluster properties in the catalogue is given in Appendix C.
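The text does not name the interpolation scheme used for $G({\bf r}_{\rm gal})$ in Eq. (17). The sketch below assumes trilinear interpolation on a regular grid with vertices at integer multiples of the grid spacing. Every galaxy is assumed to lie inside the grid, so that all eight surrounding vertices exist. The function names are hypothetical.

```python
import math

def trilinear(field, x, y, z, spacing):
    """Trilinear interpolation of a grid field (dict (i, j, k) -> value,
    vertex (i, j, k) at position (i, j, k) * spacing) to the point (x, y, z)."""
    fx, fy, fz = x / spacing, y / spacing, z / spacing
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - i, fy - j, fz - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((tx if di else 1 - tx) * (ty if dj else 1 - ty)
                     * (tz if dk else 1 - tz))
                value += w * field[(i + di, j + dj, k + dk)]
    return value

def confidence(sn_field, galaxy_positions, spacing):
    """Eq. (17): mean interpolated signal-to-noise over the supercluster galaxies."""
    values = [trilinear(sn_field, x, y, z, spacing) for x, y, z in galaxy_positions]
    return sum(values) / len(values)
```

Trilinear interpolation reproduces a field that is linear in the coordinates exactly, which makes a convenient sanity check.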

3. Galaxy and group data 

We constructed catalogues for both the SDSS main and LRG samples. The main sample has a high spatial density and allows us to follow the superclusters in detail, while the LRG sample, although sparse, is much deeper.

3.1. The SDSS main sample 

Our main galaxy sample is the main sample from the 7th data release of the Sloan Digital Sky Survey (Abazajian et al. 2009). We used the data from the contiguous 7646 square degree area in the North Galactic Cap, the so-called Legacy Survey (Fig. 1). The sample selection is described in detail in the SDSS DR7 group catalogue paper by Tago et al. (2010). We used galaxies with apparent r magnitudes 12.5 < $m_r$ < 17.77 and excluded duplicate entries. We corrected the redshifts of galaxies
Fig. 2. Average normalised densities vs distance for the main (upper panel) and LRG samples (lower panel). The densities are averaged over thin (a few h⁻¹ Mpc) concentric shells of the distance d. Solid line: the weighted luminosity density; dashed line: the observed luminosity density; dotted line: the galaxy number density.
Fig. 3. Distance-dependent weights for the main sample galaxies.



for the motion relative to the CMB and computed comoving distances of galaxies using the standard cosmological parameters: the Hubble parameter $H_0 = 100\,h$ km s⁻¹ Mpc⁻¹, the matter density parameter $\Omega_m = 0.27$, and the dark energy density parameter $\Omega_\Lambda = 0.73$.

We calculated the absolute magnitudes of galaxies in the r band as $M_r = m_r - 25 - 5\log_{10} d_L - K$. Here $m_r$ is the apparent magnitude corrected for Galactic extinction, $d_L = d(1 + z)$ is the luminosity distance in h⁻¹ Mpc ($d$ is the comoving distance and $z$ the redshift), and $K$ is the k+e correction. The k-correction for the SDSS galaxies was calculated using the KCORRECT algorithm (Blanton et al. 2003a; Blanton & Roweis 2007). In addition, we corrected the magnitudes for evolution, using the luminosity evolution model of Blanton et al. (2003b). The magnitudes correspond to the rest frame (at redshift z = 0).
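The magnitude-to-luminosity chain used in this paper can be written out in a few lines. The chain combines the absolute magnitude formula above with $L_{\rm gal} = L_\odot 10^{0.4(M_\odot - M)}$ and the r-band solar value $M_\odot = 4.64$ adopted in Sect. 3.1. Because distances are in h⁻¹ Mpc, the result is strictly $M_r - 5\log_{10} h$. The k+e correction is passed in as a number, since computing it (e.g. with KCORRECT) is outside this sketch. The function names are hypothetical.

```python
import math

M_SUN_R = 4.64  # r-band absolute magnitude of the Sun adopted in Sect. 3.1

def absolute_magnitude(m_r, d_comoving, z, k_corr=0.0):
    """M_r = m_r - 25 - 5 log10(d_L) - K, with d_L = d (1 + z) in h^-1 Mpc."""
    d_lum = d_comoving * (1.0 + z)
    return m_r - 25.0 - 5.0 * math.log10(d_lum) - k_corr

def luminosity_solar(abs_mag, m_sun=M_SUN_R):
    """Observed luminosity in solar units: L_gal / L_sun = 10**(0.4 (M_sun - M))."""
    return 10.0 ** (0.4 * (m_sun - abs_mag))
```

For example, a galaxy at the faint limit $m_r = 17.77$ with $d_L = 100$ h⁻¹ Mpc has $M_r = -17.23$.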

Groups and clusters of galaxies were determined using a modified friends-of-friends algorithm, in which a galaxy belongs to a group of galaxies if it has at least one group member galaxy closer than the linking length. The group catalogue was constructed from a flux-limited sample. To take the resulting selection effects into account, we increased the linking length with distance, calibrating the scaling by shifting nearby groups (see Tago et al. 2010 for details). As a result, the sizes and velocity dispersions of our groups are similar at all distances. Our SDSS main galaxy sample contains 583362 galaxies and 78800 galaxy groups and clusters.

For the main sample, we use the apparent magnitude limits $m_1 = 12.5$ and $m_2 = 17.77$ to obtain the luminosity limits $L_{1,2}$ in Eq. (4), and calculate the distance-dependent weight. We take $M_\odot = 4.64$ mag in the r band (Blanton & Roweis 2007) as the absolute magnitude of the Sun. For the luminosity function (Eq. 5) we use the following parameter values (Tempel et al. 2011):
- $\alpha = -1.42$, the exponent at low luminosities, $(L/L^*) \ll 1$;
- $\delta = -8.27$, the exponent at high luminosities, $(L/L^*) \gg 1$;
- $\gamma = 1.92$, which determines the speed of the transition between the two power laws;
- $L^*$, the characteristic luminosity of the transition, corresponding to $M^* = -21.97$.

Figure 2 shows the dependence of the galaxy number density, the observed luminosity density, and the weighted luminosity density of galaxies on distance. Our weighting procedure adequately restores the total luminosities: the weighted luminosity density does not depend on distance. The luminosity weights are shown in Fig. 3.

3.2. The SDSS LRG sample 

The galaxies for the LRG sample were selected from the SDSS database by an SQL query requiring that the PrimTarget field be either TARGET_GALAXY_RED or TARGET_GALAXY_RED_II. We demanded reliable redshifts (SpecClass = 2 and zConf > 0.95). We kept the galaxies within the same mask as the main galaxies (the compact continuous area in the Northern Galactic Cap). We calculated the absolute M*(z = 0) magnitudes for the LRGs as in Eisenstein et al. (2001). We examined the photometric errors of the LRGs and, to keep the magnitude errors small, deleted the galaxies brighter than M* = -23.4 from the sample. In total, our sample includes 170423 LRGs up to redshift z = 0.6, where the k+e-correction table in Eisenstein et al. (2001) stops. Note that the LRG sample is approximately volume-limited (its number density is almost constant) between distances of 400 h⁻¹ Mpc and 1000 h⁻¹ Mpc.

We fix the fiducial comoving distance $d_0$ at 435.6 h⁻¹ Mpc ($z_0 = 0.15$). The galaxies closer than that are fainter and are "not officially" LRGs (Eisenstein et al. 2001). In many properties, however, they are similar to LRGs. We need them to compare the main and LRG superclusters in the volume where the two galaxy samples overlap.

3.3. The Millennium galaxy catalogue 

We chose a catalogue by Bower et al. (2006) that is an implementation of the Durham semi-analytic galaxy formation model on the Millennium Simulation by the Virgo Consortium (Springel et al. 2005). The catalogue is available from the Millennium database at the German Astrophysical Virtual Observatory². A subsample of about one million galaxies was selected by the condition M_r > -20.25. This yielded a sample with almost the same number density of galaxies as that of the SDSS main sample (from 125 to 400 h⁻¹ Mpc). We calculate the absolute luminosities of galaxies by taking $M_\odot = 4.49$ and using the SDSS r magnitudes (Vega) given in the catalogue.



² http://www.g-vo.org/www/Products/MillenniumDatabases
Table 1. Properties of the galaxy samples and of the density fields.

Fig. 4. The nearest neighbour distance distribution of the SDSS main (upper panel) and LRG sample (lower panel) galaxies. The distribution is shown for various distance intervals.

Fig. 5. The distance dependence of the average density and of the standard deviations for the main (upper panel) and LRG samples (lower panel).



This sample serves as a volume-limited test catalogue to study 
the performance of the supercluster finding algorithm. 

4. Density fields and superclusters 

4.1. Density fields 

We chose the smoothing width $a = 8$ h⁻¹ Mpc for the SDSS main sample. The choice of the kernel width is somewhat arbitrary, but one can argue that the scale should correspond to the size of the structures we are searching for. For example, the kernel should be considerably wider than the diameters of galaxy clusters, which are a few megaparsecs. We also wish to detect structures at large distances, where galaxies are sparser. We assume that the density field ties galaxies together if they are separated by no more than 2a. Figure 4 shows the nearest neighbour distributions for different distance intervals. For the SDSS main sample, the scale $a = 8$ h⁻¹ Mpc is comfortably large enough to group galaxies together even at large distances, and a slightly narrower kernel would also suffice. Historically, $a = 8$ h⁻¹ Mpc has been used in previous supercluster catalogues (Einasto et al. 2007) and in other supercluster studies (e.g. the more recent paper by Costa-Duarte et al. 2011). As shown by Costa-Duarte et al. (2011), the density field method is actually not very sensitive to the choice of kernel width.

We first employ the sky projection mask (Fig. 1) used in Martínez et al. (2009) and then set the lower and upper distance limits. We do not need a more precise mask (e.g., the "mangle" mask provided by the NYU VAGC) because we are searching for structures of much larger dimensions. At the far end of the sample, the angular diameter of the kernel is 1.6 degrees for the main sample and 1.3 degrees for the LRGs. This is much larger than the many small holes inside the SDSS survey mask, which have diameters of less than an arcminute. The main sample density field mask is limited to distances from 55 to 565 h⁻¹ Mpc. The distance limits here, and also in the case of the LRGs, are chosen to avoid the distant incomplete regions.
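As a rough check of the main-sample angular size quoted above, the small-angle relation can be applied at the far limit of the main-sample mask, $d_{\max} = 565$ h⁻¹ Mpc. This check assumes that the kernel "diameter" means $2a = 16$ h⁻¹ Mpc:

$$\theta \approx \frac{2a}{d_{\max}} = \frac{16\ h^{-1}{\rm Mpc}}{565\ h^{-1}{\rm Mpc}} \approx 0.0283\ {\rm rad} \approx 1.6^\circ,$$

which agrees with the 1.6 degrees given for the main sample. The same relation for the LRGs would need the far limit of the LRG mask, which is not restated here.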

We chose the kernel width for the SDSS LRG sample as $a = 16$ h⁻¹ Mpc, twice the kernel scale used for the main sample, since the LRG sample is sparser. Figure 4 demonstrates that most LRGs have at least one neighbour within 2a = 32 h⁻¹ Mpc. The density field of the Millennium sample is calculated with a kernel width of $a = 8$ h⁻¹ Mpc. The mask is a cube with a side length of 500 h⁻¹ Mpc. Properties of the luminosity density fields for all three samples are given in Table 1.
Fig. 6. A spatial slice of the luminosity density field for the main sample (left panel), the standard deviation field of the luminosity density (middle panel), and the signal-to-noise ratio field (right panel). The slice has a thickness of 1 h⁻¹ Mpc and is located at z = 33 h⁻¹ Mpc (Eq. 2).

Fig. 7. The supercluster diameter (top row) and its weighted luminosity (bottom row) in the LRG, main, and Millennium samples (from left to right). Different lines correspond to different density thresholds.



4.2. Uncertainty analysis 



Following the procedure described in Appendix B, we created 100 realisations of both the main and LRG samples by randomly shifting galaxies. The shift scale was 8 h^-1 Mpc for the main and 16 h^-1 Mpc for the LRG sample. Figure 5 shows the dependence of σ_ℓ on distance. We can see the expected rise of σ_ℓ with distance, which is mainly caused by the decrease in the galaxy number density. We also see that the absolute values of the standard deviation are very low compared to the density. This can be attributed both to the stability of the large-scale structures and to the large smoothing scale of the density fields: several tens of galaxies contribute to the density at any point. Example spatial slices of the density, standard deviation, and signal-to-noise fields of the main sample are shown in Fig. 6. The images of the standard deviation and signal-to-noise fields can be related to the observed large-scale structure. Nearby peaks in the density field also stand out in the signal-to-noise map, but the distant peaks are already lost in the noise.



4.3. Properties of superclusters 

4.3.1. Superclusters of the main and LRG samples

In this section we describe the general properties of the superclusters and also compare the fixed and adaptive threshold catalogues. We chose the density difference between consecutive thresholds as δD = 0.1 (in units of the mean density). We compare the diameters and the total weighted luminosities. At this stage we do not limit our object sample in any way: it also contains small objects that may consist of only one galaxy, as well as superclusters located close to the boundary.
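The threshold scan described above can be sketched as follows: normalise the density field to its mean, mark the grid cells above each threshold D (spaced by δD = 0.1), and count the connected regions. This is a minimal illustration, not the catalogue pipeline; face-sharing (6-neighbour) connectivity, the strict ">" comparison, the function names and the toy field are all assumptions.

```python
from collections import deque
import numpy as np

# Face-sharing (6-connected) neighbours; the connectivity rule is an assumption.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_regions(mask):
    """Label connected True regions of a 3-D boolean mask by breadth-first flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already part of a labelled region
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in OFFSETS:
                nb = (i + di, j + dj, k + dk)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n

def count_objects(density, levels):
    """Number of connected over-threshold regions for each D (in units of the mean density)."""
    rel = density / density.mean()
    return {round(float(D), 1): label_regions(rel > D)[1] for D in levels}

# Toy field on a 7x1x1 grid: two peaks joined by a saddle.
field = np.array([1, 9, 5, 9, 1, 1, 1], dtype=float).reshape(7, 1, 1)
levels = np.arange(1.0, 2.05, 0.1)  # thresholds spaced by delta D = 0.1
print(count_objects(field, levels))
```

In the toy field the two peaks form one object while the threshold lies below the saddle (D up to 1.2) and two separate objects above it, which is the kind of splitting event counted in Fig. 11.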

Figure 7 shows the diameter and weighted luminosity distributions for the main and LRG samples for different density levels. The shapes of the curves are similar in both samples, with the diameter distribution offering a slightly clearer picture. For all density thresholds, the maxima of the curves are located at about the same diameter/luminosity value. The slight dip in the distributions for small and faint objects is caused by our not including any density field objects that contain no galaxies. At the high diameter/luminosity wings, the distributions have a series of maxima, which are characteristic of all density thresholds. These are caused by structures that are distinctly larger than most of the objects, and they are present even at high density levels (D = 8.0). As we move towards lower density levels (D = 6.0), the number of objects increases, while the objects themselves also get larger. The maxima caused by very large superclusters also become more prominent and at some point separate from the main body of the distribution, as they begin to include increasing numbers of smaller objects (D = 4.0). At the lowest density threshold (D = 2.0), below percolation, there is one enormous structure that extends throughout the sample volume.

... D_a = 4.4; SCl 578 - D_a = 4.3; SCl 1310 - D_a = 3.7.

In Fig. 7 the distributions for the adaptive catalogues start at the minimum diameter limit. They have higher values than the distributions for the fixed-level superclusters because they include contributions from superclusters at several density thresholds.

Figure 8 presents an example of how the superclusters are affected by the two selection methods, fixed or adaptive density thresholds. The most noticeable consequence is that the superclusters SCl 64 and SCl 94 in the upper panel (a fixed density level) have both been broken in two, and all their components have higher thresholds than before, while the superclusters SCl 1, 362, and 578 have been defined at lower density levels, and SCl 1310 has been assigned a much lower density level and is considerably larger because of that. The supercluster SCl 1320 does not meet the minimum diameter criterion (diameter > 16 h^-1 Mpc) and is not included in the catalogue with adaptive thresholds. If an object fails to qualify as a supercluster, it does not necessarily mean that the galaxies belonging to this object are absent from the catalogue; instead, they can belong to some other supercluster at a lower threshold. This depends on the specific geometry of the supercluster environment.

We have to check whether the supercluster properties depend on distance and how the different density level assignments work. Figure 9 shows the dependence of the diameter and the weighted luminosity on distance. The range spanned by the diameters and luminosities increases with distance, while the averages remain more or less stable and the standard deviations do not show a systematic increase or decrease with distance either. The average diameter is almost constant for both the main and LRG samples. The barely noticeable downward trend in the fixed-threshold supercluster catalogue is caused by small galaxy groups or even single galaxies, which are bright but do not form larger structures because of the sparseness of the galaxy sample. The weighted luminosity, however, tends to rise slightly with distance for the main sample, and quite obviously for the LRG sample. Together, these graphs suggest that superclusters of similar dimensions are brighter at large distances, which implies some overweighting of luminosities at large distances.
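The running averages in Fig. 9 are computed in distance bins (10 h^-1 Mpc wide for the main sample and 25 h^-1 Mpc for the LRG sample, according to the figure caption). A minimal sketch of such binned means and standard deviations; the function name is hypothetical, and the use of population (ddof = 0) standard deviations is an assumption.

```python
import numpy as np

def binned_stats(distance, values, width):
    """Mean and standard deviation of `values` in distance bins of the given width.

    Bins are [k*width, (k+1)*width); empty bins are skipped. Population
    standard deviations (ddof=0) are an assumption.
    """
    distance = np.asarray(distance, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = np.floor(distance / width).astype(int)
    centres, means, stds = [], [], []
    for b in np.unique(idx):
        sel = values[idx == b]
        centres.append((b + 0.5) * width)
        means.append(sel.mean())
        stds.append(sel.std())
    return np.array(centres), np.array(means), np.array(stds)

# Hypothetical supercluster distances and diameters (h^-1 Mpc), main-sample bin width.
c, m, s = binned_stats([5, 7, 15, 25], [10, 20, 30, 40], width=10.0)
print(c, m, s)
```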

Figure 10 shows the dependence of the supercluster confidence estimates on supercluster richness (its number of galaxies) and distance. The confidence estimates are calculated as in Eq. (B.2). Both graphs display the expected behaviour: the confidence estimates diminish with distance, and richer superclusters have higher signal-to-noise ratios. This property can be used to select objects for further studies. Predictably, the confidence estimates for superclusters in the LRG sample are significantly lower. The confidence estimates depend on the density threshold, because at lower density levels more galaxies from density field regions with higher variance are included. Because of that, the fixed-threshold superclusters have higher confidence estimates in Fig. 10.

Next we take a look at structure breakups and adaptively assigned supercluster thresholds. Figure 11 shows the number of splitting events, the percolation level, and the 95% limiting density threshold for D_scl. Figure 12 gives an example of how supercluster diameters and luminosities change drastically during mergers as the density level is lowered, while remaining relatively stable (though still growing) in between. Supercluster SCl 24 in Fig. 12 is a part of the Sloan Great Wall, and at densities D < 4.7 it actually includes all of the SGW superclusters (Einasto et al. 2010). For the main sample, the density threshold assignments do not show any clear dependence on distance, while for the LRG sample the adaptively found levels increase with distance (Fig. 13). The broad peaks visible in Figs. 9, 13, and 14 at approximately 250 h^-1 Mpc are caused by the Sloan Great Wall region superclusters.

Selection effects can cause the number density of superclusters to depend on distance. Figure 14 demonstrates that using a single density level to define superclusters causes a significant rise in the number of objects with distance. The reason for this is Poisson noise, amplified by the increased density contrast that the luminosity weighting produces. As mentioned before, we can add the missing luminosity only where we see galaxies, not where it is actually missing. In contrast, the number density of the adaptive-threshold catalogue superclusters is independent of distance.
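Number densities like those in Fig. 14 can be estimated by counting objects in distance shells and dividing by the shell volume inside the survey mask. The sketch below assumes that the mask covers a constant fraction of the sky at all distances (a simplification: the real SDSS mask and its distance limits are not modelled), and `sky_fraction` is a hypothetical input.

```python
import math
import numpy as np

def number_density_in_shells(distances, edges, sky_fraction):
    """Objects per unit volume in spherical shells defined by `edges`.

    Uses np.histogram binning (the last shell includes its outer edge).
    `sky_fraction` is the fraction of the full sky inside the survey mask,
    assumed constant with distance (a hypothetical input here).
    """
    counts, _ = np.histogram(distances, bins=edges)
    edges = np.asarray(edges, dtype=float)
    volumes = sky_fraction * 4.0 / 3.0 * math.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    return counts / volumes

# Hypothetical supercluster distances (h^-1 Mpc) in 100 h^-1 Mpc shells.
print(number_density_in_shells([120.0, 150.0, 260.0], [100.0, 200.0, 300.0], sky_fraction=0.2))
```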

4.3.2. Superclusters of the Millennium sample 

We used the Millennium galaxy sample to evaluate the supercluster-finding procedure as applied to an ideal volume-limited sample, in which there are no distance-related effects. A separate further study will look more closely at the differences in object selection and their properties using several simulated flux- and volume-limited samples.

Fig. 9. Supercluster diameters (left panels) and total weighted luminosities (right panels) vs distance for fixed and adaptive thresholds. The main sample is shown in the upper row and the LRG sample in the lower row. Points mark the diameters and luminosities of individual superclusters; the line is the average with bin widths of 10 h^-1 Mpc for the main and 25 h^-1 Mpc for the LRG sample. The standard deviations in bins are shown with grey contours.

Figure 7 allows us to compare the supercluster diameter and weighted luminosity distributions for the main sample to those for the Millennium sample. The distributions are strikingly similar, with the main sample containing almost the same number of superclusters. The shape of the Millennium distributions at the density level D = 8.0 also indicates the presence of large Sloan Great Wall-like structures. Still, in contrast to the main sample, at the lowest density threshold there are also other large structures besides the one that has percolated. Also, the adaptive-level superclusters appear to be slightly smaller than those in the main sample, which does not contradict earlier similar findings (Einasto et al. 2006). The graphs of the number of structure splits versus the density threshold for the Millennium and the main sample are virtually indistinguishable. The two samples also share the percolation threshold, and their 95% maximum density levels differ by only one δD (Fig. 11).

The properties of the supercluster catalogues for all three samples are summarised in Table 2. Both the main and LRG samples have most superclusters at the same threshold D = 3.0, with 1566 and 4780 objects, respectively (the volume of the LRG sample is about 14 times larger than that of the main sample). There is only one major difference with the Millennium sample: it has most superclusters (1316) at a slightly higher threshold, D = 3.3, than the observational samples (D = 3.0). We find that significantly more galaxies belong to superclusters in the adaptive catalogue. For the main and Millennium samples, the percentage rises from about 15% to more than a quarter of the galaxies, and in the LRG sample about 80% of the galaxies belong to superclusters in the adaptive-threshold catalogue.

Comparison with the volume-limited Millennium sample 
shows that our supercluster algorithms generally work well 
and that we have avoided the selection problems inherent to 
magnitude-limited samples. 



4.3.3. Large-scale variations in the SDSS main sample 

If we look at the positions and density levels in the adaptive-threshold supercluster map of the SDSS main sample (Fig. 15), we see that there are strong variations in the supercluster thresholds depending on the region where they are located. The threshold level needed to define a supercluster is tightly correlated with the overall mean density. The spatial scale of these variations is about 200-300 h^-1 Mpc. One can discern the dominant supercluster plane (Einasto et al. 1997) and the system of large voids behind it (described in detail by Platen 2009). The fact that these variations are not lost in the projection (we show the 2-D projection of the full Legacy volume) shows that they are very large. The reason for these variations is presently unclear, so we leave their quantification and study for the future.

... also shown. The lines begin at the level where the object separates from the larger structure.

Fig. 13. Adaptively assigned supercluster thresholds, the average and the standard deviation vs distance.



5. Conclusions and discussion 

Superclusters are the elements of the overall large-scale structure, the "LEGO pieces" of the Universe. As such, they describe the whole cosmic web of galaxies. They are also the largest objects of that web, and, although they are not gravitationally bound, in the future they may become bound isolated structures, the real "island Universes" (Araya-Melo et al. 2009).

Fig. 14. The dependence of number densities on distance for adaptive and fixed threshold superclusters. The main sample is shown in the upper panel and the LRG sample in the lower panel.

Developing supercluster catalogues is useful for future, and sometimes unexpected, applications. The study of quasar environments (Lietzen et al. 2009) is a natural application of the multilevel supercluster catalogue, which aids the uniform description of the overall matter density field. Searching for specific directions that are promising for observations is another example of where supercluster catalogues are indispensable; for example, a search for the elusive warm-hot intergalactic medium (WHIM) can be more effective with prior knowledge of the structures that are theoretically associated with the WHIM (Fang et al. 2010). The identification of the Planck SZ source, mentioned in the introduction, is a perfect example of an unexpected development.

The main result of this work is a set of supercluster cata- 
logues, based on the SDSS DR7 galaxy data, for the main and 
the LRG samples. The catalogues are public. We define super- 
clusters, first for different mean density thresholds, and then for 
adaptive density thresholds that are different for each superclus- 
ter. 
Table 2. Supercluster catalogue properties.
Notes. Columns in the table: 1: sample name and threshold assigning method; 2: the number of superclusters; 3: the number density of superclusters; 4: the fraction of superclusters close to the sample edge; 5: the fraction of galaxies in superclusters; 6: the fixed threshold value; 7: the density threshold with most objects; 8: the number density of objects for the threshold D(N_max); 9: the maximum allowed value for D_a; 10: the percolation threshold; 11: the minimum allowed supercluster diameter.



It is possible to create almost selection-free samples of superclusters from flux-limited catalogues. We studied the supercluster properties and found little dependence on distance. We also compared the SDSS superclusters with superclusters based on the Millennium galaxies, built using the same algorithms, and found that the two supercluster samples have very similar properties.

While the LRG sample is very sparse and the number den- 
sity of superclusters in its volume is much lower than for the 
main sample, one can still construct a supercluster sample with 
comparable properties. 

While previous supercluster catalogues were based on fixed density levels (or nearest neighbour distances), we feel that the multiscale (multi-threshold) approach is essential for defining the supercluster environment. The multi-level catalogues are useful for studying the overall density field, but for following individual superclusters, their structure, and their evolution, the adaptive threshold algorithm produces the best superclusters. With the full fixed-threshold supercluster data set, it is possible to create new adaptive-threshold catalogues using alternative sets of limiting parameters. The adaptive-threshold supercluster definition procedure permits more galaxies to be included in more superclusters, while also suppressing the selection effects. It allows us to generate practically volume-limited supercluster samples. In the LRG sample, the vast majority of galaxies are enclosed in superclusters. This is natural, since LRGs are bright galaxies presumably residing in the cores of large galaxy groups, which in turn are very likely to be situated in superclusters (Einasto et al. 2001).

Galaxy superclusters are fairly well-defined systems. As the density level increases, the supercluster sizes change radically at structure breaks, but remain relatively stable in between, because they do not acquire or lose many galaxies while the density level changes. An important point is that at present the number of known superclusters is small (especially the number of very large superclusters), which makes it possible to study them individually by looking at every one of them and correcting possible glitches in their delineation.

There are certainly problems that remain unresolved at the moment. One is the question of boundary effects: using a fixed distance from the sample edge to limit the supercluster sample, as is sometimes done, is not entirely justified. First, it removes a large fraction of galaxies from the present samples. Second, many of the large superclusters (e.g., SCl 126 of the Sloan Great Wall; Einasto et al. 2001) touch the SDSS mask edges, but are nevertheless among the largest superclusters. In fact, most of the nearby superclusters are incomplete because of the cone-like shape of the survey. We therefore also included such superclusters and marked those that were affected by the sample borders; it is up to the catalogue users how they take that mark into account. Superclusters from the LRG sample show clear selection effects at the outer border of the sample volume. This is caused by the low number density and strong luminosity weighting. An unexpected result is that there is an overall density variation, and a corresponding variation of supercluster properties, on very large scales (about 200 h^-1 Mpc) in the SDSS Legacy sample volume. We will discuss that in detail in the next paper.

Fig. 15. The SDSS main sample supercluster map. Different symbol types and sizes show the density threshold levels used to delineate the superclusters; blue points: D < 3.0, green squares: 4.0 < D < 4.5, and red rings: D > 4.5. The map is the 2-D projection of the whole supercluster sample. Substantial differences in the levels can be seen, e.g., in the regions around (-60, 300) and at the Sloan Great Wall region at (0, 220).
Appendix A: Kernel density estimates

As superclusters are sought as compact regions of space where the luminosity density exceeds a certain threshold, we have to convert the spatial positions of galaxies into a luminosity density field. The standard approach is to assume a Cox model for the galaxy distribution, where the galaxies are distributed in space according to an inhomogeneous point process with the intensity λ(r) determined by an underlying random field (see, e.g., Martínez & Saar 2002). The best way to estimate this intensity is by a kernel sum (Davison & Hinkley 1997, sect. 8.3.2):

$$ \lambda(\mathbf{r}) = \sum_{i=1}^{N} K(\mathbf{r} - \mathbf{r}_i; a), $$ (A.1)



where the sum is over all N data points, r_i are the coordinates, K(·) is the kernel, and a the smoothing scale. As we estimate luminosities, we multiply kernel amplitudes by weighted galaxy luminosities L_gal,i,w and calculate the luminosity density field as

$$ \ell(\mathbf{r}) = \sum_{i=1}^{N} K(\mathbf{r} - \mathbf{r}_i; a)\, L_{\mathrm{gal},i,w}. $$ (A.2)
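A direct (non-gridded) evaluation of the kernel sum (A.2) can be sketched with the product B3 kernel defined in Eqs. (A.5)-(A.9) of this appendix. This is an illustration under stated assumptions, not the code used for the catalogues: the function names are hypothetical, the sum is evaluated by brute force over all galaxy-point pairs, and the galaxy, luminosity and grid values are made up.

```python
import numpy as np

def b3(x):
    """Cubic box spline B3 of Eq. (A.5), set exactly to zero outside |x| < 2."""
    x = np.asarray(x, dtype=float)
    poly = (np.abs(x - 2) ** 3 - 4 * np.abs(x - 1) ** 3 + 6 * np.abs(x) ** 3
            - 4 * np.abs(x + 1) ** 3 + np.abs(x + 2) ** 3) / 12.0
    return np.where(np.abs(x) < 2.0, poly, 0.0)

def luminosity_density(points, positions, weighted_lums, a):
    """Direct kernel sum of Eq. (A.2) with the kernel (1/a^3) B3(x/a) B3(y/a) B3(z/a)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    positions = np.atleast_2d(np.asarray(positions, dtype=float))
    u = (points[:, None, :] - positions[None, :, :]) / a  # shape (n_points, n_galaxies, 3)
    kernel = np.prod(b3(u), axis=-1) / a ** 3
    return kernel @ np.asarray(weighted_lums, dtype=float)

# One hypothetical galaxy with weighted luminosity 2 (arbitrary units), a = 8 h^-1 Mpc,
# evaluated on a grid with step 1 h^-1 Mpc.
r = np.arange(-20.0, 21.0)
grid = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)
ell = luminosity_density(grid, [[0.3, -1.2, 2.5]], [2.0], a=8.0)
print(ell.sum() * 1.0 ** 3)  # grid sum times cell volume recovers 2 up to rounding
```

Because a/Δ = 8 is an integer here, the grid sum times the cell volume reproduces the galaxy's luminosity, which is the discrete form of the normalisation (A.3).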



The kernels K(·) are required to be distributions, positive everywhere and integrating to unity; in our case,

$$ \int K(\mathbf{y})\, d^3y = 1. $$ (A.3)



Good kernels for calculating densities on a spatial grid are the box splines B_J. They are local and they are interpolating on a grid:

$$ \sum_i B_J(x - i) = 1, $$ (A.4)

for any x and a small number of indices that give non-zero values for B_J(x). To create our density fields we use the popular B_3 spline function:

$$ B_3(x) = \frac{|x-2|^3 - 4|x-1|^3 + 6|x|^3 - 4|x+1|^3 + |x+2|^3}{12}. $$ (A.5)

This function differs from zero only in the interval x ∈ (−2, 2), meaning that the sum in (A.4) only includes values of B_3(x) at four consecutive arguments x ∈ (−2, 2) that differ by 1. In practice, we calculate the kernel sum (A.3) on a grid. Let the grid step be Δ < a, and a = kΔ, where k > 1 is an integer. Then the sum over the grid

$$ \sum_i B_3\!\left(\frac{x - i\Delta}{a}\right) = \frac{a}{\Delta}, $$ (A.6)

because it consists of k groups of four values of B_3(·) at consecutive arguments, differing by 1. Thus, the kernel

$$ K_B^{(1)}(x/a; \Delta) = \frac{1}{a} B_3(x/a) $$ (A.7)

differs from zero only in the interval x ∈ [−2a, 2a] (Fig. A.1) and preserves the interpolation property exactly for all values of a and Δ for which the ratio a/Δ is an integer (the error is also very small if this ratio is not an integer, but a is at least several times larger than Δ). The three-dimensional kernel K_B^{(3)} is given by a direct product of three one-dimensional kernels:

$$ K_B^{(3)}(\mathbf{r}/a; \Delta) = K_B^{(1)}(x/a; \Delta)\, K_B^{(1)}(y/a; \Delta)\, K_B^{(1)}(z/a; \Delta) $$ (A.8)

$$ \qquad = \frac{1}{a^3} B_3(x/a)\, B_3(y/a)\, B_3(z/a), $$ (A.9)

where r = (x, y, z). Although this is a direct product, it is practically isotropic (Saar 2009). This can already be seen from the fact that it is very close to a Gaussian with zero mean and σ = 0.6 (Fig. A.1), and the direct product of one-dimensional Gaussians is exactly isotropic.

Fig. A.1. The shape of the kernel B_3(x). Solid line: the B_3(x) kernel; dashed line: a Gaussian with σ = 0.6.
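The properties of B_3 stated in this appendix (the partition of unity, Eq. A.4; the grid sum a/Δ for integer a/Δ, Eq. A.6; and the closeness to a zero-mean Gaussian with σ = 0.6) can be checked numerically. A minimal sketch; the sample point x = 0.3 and the values a = 8, Δ = 1 are arbitrary choices.

```python
import math
import numpy as np

def b3(x):
    """Cubic box spline B3 of Eq. (A.5), set exactly to zero outside |x| < 2."""
    x = np.asarray(x, dtype=float)
    poly = (np.abs(x - 2) ** 3 - 4 * np.abs(x - 1) ** 3 + 6 * np.abs(x) ** 3
            - 4 * np.abs(x + 1) ** 3 + np.abs(x + 2) ** 3) / 12.0
    return np.where(np.abs(x) < 2.0, poly, 0.0)

def gaussian(x, sigma=0.6):
    """Normal density with zero mean and standard deviation sigma."""
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

x = 0.3
# Eq. (A.4): unit-spaced shifts of B3 sum to 1 (only four terms are non-zero).
print(b3(x - np.arange(-5, 6)).sum())
# Eq. (A.6): with a = k * delta (here a = 8, delta = 1), the grid sum equals a / delta = 8.
a, delta = 8.0, 1.0
print(b3((x - np.arange(-40, 41) * delta) / a).sum())
# Largest pointwise difference from a Gaussian with sigma = 0.6 (about 0.01).
xs = np.linspace(-3.0, 3.0, 601)
print(np.max(np.abs(b3(xs) - gaussian(xs))))
```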

Appendix B: Error analysis of the density field 

To characterise the errors of our density field estimates we have to choose a statistical model for the galaxy distribution. The most popular model used for the statistics of the spatial distribution of galaxies in the Universe is the "Poisson model" (Peebles 1980), an inhomogeneous Poisson point process where the local intensity of the process is defined by the amplitude of the underlying realisation of a random field. In statistics it is called the Cox random process; see an introduction and examples in Martínez & Saar (2002) and Illian et al. (1993). In cosmology, the random fields used are usually Gaussian or log-Gaussian fields.

Like any statistical model, it has been postulated to describe the galaxy distribution, and its success in applications describing the statistical properties of that distribution tends to support it. For example, this model was used to develop methods for estimating the two-point correlation function (Hamilton 1993) and the power spectrum of the galaxy distribution (Tegmark et al. 1998). These methods have been extensively used to study the galaxy distribution. The same model serves as the basis for the maximum-likelihood approach to recovering the large-scale cosmological density field by Kitaura et al. (2010).

We use the kernel method to estimate the intensity of our Cox process. A popular procedure for estimating the uncertainties of kernel-based intensity estimates for inhomogeneous Poisson processes is bootstrap (see, e.g., Davison & Hinkley 1997, sect. 8.3.2). Because the kernel used in estimating the intensity in Eq. (A.1) is compact, there is only a finite and, in practice, a relatively small number of members in this sum. Bootstrap is used to estimate the sample errors (discreteness errors) caused by the discrete sampling. As stressed by Silverman & Young (1987), bootstrap consists of two separate elements. Let our sample be (X_1, ..., X_n). First, to estimate the discreteness error, caused by the finite sample size n, of the sample parameter θ(X) that we are estimating, we use the sampling method, drawing a large number of samples of size n from the (integral) population distribution function F(X). Technically, it is the simplest method for generating random numbers with a given distribution: select n uniform random numbers U_i in the interval (0, 1) and, for each U, select the sample value X given by F(X) = U. The other element of bootstrap is to assume that the population distribution F can be approximated by the empirical distribution function F_n defined by all the n observed values (X_1, ..., X_n) that form the sample. If all X_i are i.i.d. (independent and identically distributed), this function can be defined as a step function with increments 1/n at every X*_i, where (X*_1, ..., X*_n) is the ordered increasing sequence of the original sample values X_i. If we select from F_n, any bootstrap sample consists of values of the original sample, selected from it randomly with replacement (Efron & Tibshirani 1993). Using the values of θ for all bootstrap samples, we can find the sampling errors (usually the bias and the variance) of the parameter θ. These are the bootstrap error estimates we are looking for.
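The two elements of the bootstrap described above can be combined in a few lines: inverse-transform sampling from the step function F_n picks each original value with probability 1/n, which is the same as resampling with replacement. A minimal sketch; the function names, the seed, and the toy statistic (the sample mean) are illustrative choices.

```python
import numpy as np

def sample_empirical(sample, size, rng):
    """Draw `size` values from the empirical distribution F_n by inverse transform.

    F_n jumps by 1/n at each ordered value X*_1 <= ... <= X*_n, so for U uniform
    on [0, 1) the value X*_{floor(nU)+1} is an exact draw from F_n.
    """
    ordered = np.sort(np.asarray(sample, dtype=float))
    u = rng.random(size)  # uniform in [0, 1)
    return ordered[np.floor(len(ordered) * u).astype(int)]

def bootstrap_bias_variance(sample, statistic, n_boot, rng):
    """Bootstrap estimates of the bias and variance of `statistic`."""
    sample = np.asarray(sample, dtype=float)
    theta_hat = statistic(sample)
    thetas = np.array([statistic(sample_empirical(sample, len(sample), rng))
                       for _ in range(n_boot)])
    return thetas.mean() - theta_hat, thetas.var(ddof=1)

rng = np.random.default_rng(42)
data = np.arange(1.0, 101.0)
bias, var = bootstrap_bias_variance(data, np.mean, n_boot=2000, rng=rng)
# For the mean, sqrt(var) should be close to data.std() / sqrt(len(data)), about 2.89.
print(bias, np.sqrt(var))
```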

Theorems that prove the effectiveness of the bootstrap error estimates are usually proved for the case where both the sample size and the number of bootstrap samples approach infinity (Shao & Tu 1995); for finite sample sizes, simulations are used. In our case, the sample that defines the intensity estimate for a given point in space consists of those galaxies whose positions are within the kernel volume around that point. The usual rule of thumb says that bootstrap error estimates may be considered reliable when the sample size is more than 30 (even sizes as small as 14 have been used in simulation examples, as in Efron & Tibshirani (1993) and Silverman & Young (1987)). Our kernel volumes include, on average, 150 galaxies in the case of the SDSS main sample and 25 galaxies for the LRG sample. At the density levels where we define our superclusters (D ≈ 5), the corresponding numbers are 750 and 125, so our error estimates should be reliable enough.

For the inhomogeneous Poisson process, where the points X_i are identically and independently distributed with the locally defined intensity λ, bootstrap can be used to estimate the errors of the kernel estimate (Eq. A.1). In practice, for intensity estimation a bootstrap version called the smoothed bootstrap is used. This is a version of the parametric bootstrap that uses a smoothed version of the empirical distribution function instead of the function itself. Silverman & Young (1987) demonstrated that it is more effective than the standard bootstrap in estimating the variance of the intensity. To use it in practice, we generate bootstrap samples of the same size as the original sample, selecting the galaxies from our sample randomly with replacement, as usual in bootstrap, but give the selected galaxies random displacements. As explained in Davison & Hinkley (1997) and Silverman & Young (1987), the random spatial displacements are required to have a probability density of the same form as the kernel function, but it is useful to undersmooth, using a narrower kernel for the displacements than the kernel used for calculating the intensity estimates. We undersmooth by a factor of two. As our grid is huge (about 10^8 vertices), we use 100 bootstrap samples for each grid vertex. Simulation studies have shown this number to be large enough to estimate the sample variance (Efron & Tibshirani 1993).
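The smoothed bootstrap described above, together with the standard deviation σ_ℓ (Eq. B.1) and the signal-to-noise ratio 𝒢 (Eq. B.2) given below, can be sketched for a single point in space. Several details are assumptions rather than the authors' implementation: "undersmoothing by a factor of two" is read as a displacement kernel of scale a/2 (left as a parameter); the displacements are drawn per axis from B_3 using the standard fact that B_3 is the density of a sum of four independent U(−1/2, 1/2) variables; 𝒢 uses the original-sample ℓ; and the galaxy positions and luminosities are made up.

```python
import numpy as np

def b3(x):
    """Cubic box spline B3 of Eq. (A.5), set exactly to zero outside |x| < 2."""
    x = np.asarray(x, dtype=float)
    poly = (np.abs(x - 2) ** 3 - 4 * np.abs(x - 1) ** 3 + 6 * np.abs(x) ** 3
            - 4 * np.abs(x + 1) ** 3 + np.abs(x + 2) ** 3) / 12.0
    return np.where(np.abs(x) < 2.0, poly, 0.0)

def b3_displacements(n, scale, rng):
    """n random 3-D displacements whose per-axis density is (1/scale) B3(x/scale).

    B3 is the density of a sum of four independent U(-1/2, 1/2) variables,
    so summing four uniforms per axis samples it exactly.
    """
    return scale * rng.uniform(-0.5, 0.5, size=(n, 3, 4)).sum(axis=-1)

def density_at(point, positions, lums, a):
    """Kernel-sum luminosity density at one point (product B3 kernel)."""
    u = (np.asarray(point, dtype=float) - positions) / a
    return float(np.sum(np.prod(b3(u), axis=-1) * lums) / a ** 3)

def smoothed_bootstrap(point, positions, lums, a, n_boot, rng, undersmooth=2.0):
    """Return (ell, sigma_ell, G) at `point` from smoothed-bootstrap resamples."""
    positions = np.asarray(positions, dtype=float)
    lums = np.asarray(lums, dtype=float)
    n = len(positions)
    ell = density_at(point, positions, lums, a)
    boot = np.empty(n_boot)
    for m in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample galaxies with replacement
        shifted = positions[idx] + b3_displacements(n, a / undersmooth, rng)
        boot[m] = density_at(point, shifted, lums[idx], a)  # luminosities move with galaxies
    sigma = boot.std(ddof=1)  # Eq. (B.1), with the N - 1 normalisation assumed
    return ell, sigma, ell / sigma  # G of Eq. (B.2)

rng = np.random.default_rng(1)
positions = rng.normal(0.0, 5.0, size=(200, 3))  # hypothetical galaxy positions, h^-1 Mpc
lums = np.ones(200)  # hypothetical weighted luminosities
print(smoothed_bootstrap([0.0, 0.0, 0.0], positions, lums, a=8.0, n_boot=100, rng=rng))
```

In the catalogue, the same procedure is applied at every grid vertex with 100 bootstrap samples, which is why a full-grid version would evaluate the density on the whole grid for each resample rather than point by point.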

Another point that has to be taken care of when estimating global (population) statistics such as correlation functions or power spectra (spectral densities) of Cox processes is to account for the difference between the measured statistic for the specific realisation of the random field and for the random field as a whole (the so-called cosmic noise problem; see, e.g., Szapudi & Colombi 1996; Peacock 1999, p. 522). The discreteness errors, estimated by bootstrap, and the realisation variance combine in a subtle way (Cohn 2006). In our case, fortunately, the spatial density (the intensity of the Cox process that we are estimating, i.e. the geography of the large-scale structure) is exactly the underlying random realisation itself: we are measuring the cosmic noise, and so are not interested in the mean density of the Universe. The only errors our intensity estimates have are discreteness errors, and these can be estimated by bootstrap.

We select the galaxies for the bootstrap samples together with their measured luminosities, and we consider the galaxy distribution as a marked Cox process, with luminosities as marks. If we could statistically model the luminosity distribution among galaxies as random (a random marks model, see, e.g., Illian et al. 1993), we could build bootstrap samples by randomly relabelling galaxies, choosing their luminosities as in the usual bootstrap from the luminosities of the sample galaxies inside the kernel volume. This, however, would not be right, as galaxies are well known to be segregated by luminosity: more luminous galaxies populate regions of higher galaxy number density (Hamilton 1988; Girardi et al. 2003). We chose another way and tried modelling the luminosity errors. These consist of a small error in the luminosity weights, generated by the errors of the luminosity function, and an error in modelling the evolution correction (Blanton et al. 2003b). We tested the effect of these errors by selecting them randomly from the observed distributions, compared the intensity estimates with modified luminosities and with fixed luminosities, and found no significant differences. As the luminosity errors were much smaller than the bootstrap deviations of the intensity estimates (the discreteness errors), and we did not find a good statistical model to describe them, we ignored these errors.

After calculating the positions for the galaxies of a bootstrap sample, we find a new intensity estimate. We repeated the procedure a number of times (for this paper, we generated 100 bootstrap samples for every grid point where we estimated the intensity) and found the standard deviation σ_ℓ of the intensity for each grid vertex as

$$ \sigma_\ell = \sqrt{\frac{1}{N-1}\sum_{m=1}^{N}\left(\ell^*_m - \bar{\ell}^*\right)^2}, $$ (B.1)

where N is the number of bootstrap realisations, ℓ*_m the intensity for a bootstrapped sample, and ℓ̄* its mean over all realisations.
We also found the "signal-to-noise ratio" for each grid point:

$$ \mathcal{G} = \frac{\ell}{\sigma_\ell}. $$ (B.2)

Appendix C: Description of the catalogue

The catalogue consists of several tables with some redundancies between them. For each density level D there exists a table with all superclusters found at that threshold. These tables contain the following information (some less important properties are omitted here, but can be found in the readme files):

- a unique identification number in the long and short forms;

- the number of galaxies and groups (the latter for the main sample alone);

- the supercluster volume as the number of the constituent grid cells times the cell volume (Eq. 12);

- the supercluster luminosity as the sum of densities at grid vertices (Eq. 15);

- the supercluster luminosity as the sum of the observed galaxy luminosities (Eq. 14);

- the supercluster luminosity as the sum of the weighted galaxy luminosities (Eq. 13). For the main sample supercluster catalogue, we consider this as the best estimate of the total luminosity of the supercluster;

- the maximum density in the supercluster;

- the equatorial coordinates (J2000 here and hereafter) and the comoving distance of the highest density peak;

- the equatorial coordinates and the comoving distance of the centre of mass (Eq. 10);

- the Cartesian coordinates (Eq. 2) of the highest peak and of the centre of mass;

- the supercluster diameter as the maximum distance between the galaxies in the supercluster;

- the identifier of the "marker" galaxy in the Tago et al. (2010) catalogue;

- the equatorial coordinates and the redshift of the "marker" galaxy;

- the confidence estimate for the supercluster found from the signal-to-noise field 𝒢 (Eq. B.2);

- a flag showing whether the supercluster is in contact with the mask boundary (1: yes, 0: no);

- the number of objects that will split from the supercluster above the current density threshold.

A similarly structured supercluster catalogue with adaptively assigned density thresholds has been compiled by combining the supercluster data in the tables described above. For each supercluster we take the data from the fixed-level catalogue that corresponds to its defining density level and add the threshold value.

Additionally, we provide lists of galaxies and groups, together with the supercluster identifiers they are attributed to, for all density levels. We also present the supercluster splitting tree in the form of a table, where each supercluster is given the identifier of the object it belongs to at all given thresholds.

As the full volume of the supercluster catalogues is very large, we have chosen to upload only part of them to the CDS: the two adaptive catalogues, one for the main sample and the other for the LRGs, and two fixed-level catalogues, of D = 5.0 for the main sample and of D = 4.4 for the LRGs. The full catalogue is accessible at http://atmos.physic.ut.ee/~juhan/super/ with a complete description in the readme files.